Motivation
Four general principles
Case study
Costs and benefits
25 May 2016
Motivation
Universalism
'Communism'
Disinterestedness
Organized skepticism
Robert Boyle's vacuum pump
Documentation
'Communal witnessing'
Circumstances
Empirical Reproducibility
Computational Reproducibility
Statistical Reproducibility
Computational Biology
Computational Physics
Computational Chemistry
Computational Economics
Computational …
Reproducibility is necessary for scientific progress
Computers wrangle all the data, but also obscure it
Especially point-and-click actions
Technical solutions available in open source/format/data/access
Four general principles of reproducible research that have emerged in other fields
✓ Plain text file formats
✓ persistent URLs
Victoria Stodden's Reproducible Research Standard
✓ Data: CC-0 (public domain)
✓ Code: MIT (no liability for reuse)
✓ Text/Figures/Media: CC-BY (attribution required)
✗ Mouse gestures leave few traces that are enduring and accessible to others
✗ Easy to lose track of ad hoc changes in mouse-driven environments
✓ Everything should be scripted: data ingest, cleaning, analysis, visualizing, and reporting
✓ Scripts create a very high-resolution record of the research workflow in a plain text file that can be reused and inspected by others
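A minimal sketch of what a fully scripted workflow might look like, with one function per stage (ingest, clean, analyze, report). It is written in Python for illustration only; the case study in this talk uses R, and the data, column names, and function names below are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real project this would be a CSV file on disk.
RAW_CSV = """site,length_mm
A,12.5
A,
B,14.0
B,15.5
A,13.5
"""

def ingest(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Drop rows with a missing measurement and convert the rest to float."""
    return [
        {"site": r["site"], "length_mm": float(r["length_mm"])}
        for r in rows
        if r["length_mm"].strip()
    ]

def analyze(rows):
    """Compute the mean length per site."""
    by_site = {}
    for r in rows:
        by_site.setdefault(r["site"], []).append(r["length_mm"])
    return {site: statistics.mean(vals) for site, vals in sorted(by_site.items())}

def report(summary):
    """Render the summary as plain-text CSV so it can be inspected and diffed."""
    lines = ["site,mean_length_mm"]
    lines += [f"{site},{mean:.2f}" for site, mean in summary.items()]
    return "\n".join(lines)

def run_pipeline(text):
    """Run every stage in order; no manual steps in between."""
    return report(analyze(clean(ingest(text))))

if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step, including the decision to drop the incomplete row, is written down in the script, a reader can rerun the analysis and see exactly how each number was produced.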
✗ Managing different versions of computer files is very challenging
✗ Poor version control leads to losing track of the provenance of results
✓ Version control systems (VCS) designed for software engineering are also suitable for research code and text
✓ Commit history preserves a high-resolution, transparent record of the development of a file or set of files
✓ Enables remote collaborators to work together without overwriting each other’s work
✗ Minor changes in software can cripple complex research pipelines
✗ Managing software dependencies is tedious
✓ List of the key pieces of software and their version numbers
✓ Archive a self-contained computational environment like a virtual machine or Linux container
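One way the first point could be automated is to have the analysis itself write out the software versions it ran with. The sketch below uses Python's standard library as an illustration (the case study in this talk uses R and Docker instead); the function name `describe_environment` and the package names passed to it are hypothetical.

```python
import json
import platform
from importlib import metadata

def describe_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }

if __name__ == "__main__":
    # Hypothetical package list; replace with the packages the analysis imports.
    env = describe_environment(["numpy", "pandas"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saving this output as a plain-text file alongside the results gives readers a record of the versions used. It does not replace archiving the full environment in a virtual machine or container.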
Case Study
All files on figshare.com
Data in CSV format
Organised as an R package
R & Rmarkdown documents
All files tracked with Git, hosted on GitHub
Collaboration did not occur on GitHub because no co-authors used it
Docker image and Dockerfile to contain RStudio, packages, code and external dependencies
Based on Rocker image and templates
.travis.yml
circle.yml
README.md
R package & manuscript
VCS repository
code CI
environment CI
Costs & benefits
Time learning the tools
A lot of time
Built-in vs Bolt-on
Comfort of knowing that I am right & have no secrets
Save time by reusing my previous code
Open data confers citation advantages, but magnitude is highly variable
Open Source community membership provides access to high-quality help
Presentation written in R Markdown using ioslides
Compiled into HTML5 using RStudio & knitr
Source code hosting: https://github.com/benmarwick/
ORCID: http://orcid.org/0000-0001-7879-4531
Licensing: